A Lempel-Ziv Compressed Structure for Document Listing

Authors

Hector Ferrada

Gonzalo Navarro

Abstract

Document listing is the problem of preprocessing a set of sequences, called documents, so that later, given a short string called the pattern, we retrieve the documents where the pattern appears. While optimal-time and linear-space solutions exist, the current emphasis is on reducing the space requirements. Current document listing solutions build on compressed suffix arrays. This paper is the first attempt to solve the problem using a Lempel-Ziv compressed index of the text collection. We show that the resulting solution outputs most of the answer documents very quickly, taking more time to report the final ones. This makes the index particularly useful in interactive scenarios or when listing only some of the documents is sufficient. Yet it also offers a competitive space/time tradeoff when returning the full answer.
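To make the problem statement concrete, here is a minimal Python sketch of the classic uncompressed baseline, not the Lempel-Ziv method this paper proposes: concatenate the documents, build a suffix array, find the interval of suffixes that start with the pattern, and map each occurrence back to its document. The helper names (`build_index`, `sa_range`, `list_documents`) and the `"\x00"` separator are illustrative assumptions, and the naive suffix sorting is only suitable for toy inputs.

```python
SEP = "\x00"  # assumed separator; must not occur in any document or pattern


def build_index(docs):
    """Concatenate the documents and build a naive suffix array plus a position-to-document map."""
    text = SEP.join(docs) + SEP
    doc_of = []
    for d, doc in enumerate(docs):
        doc_of.extend([d] * (len(doc) + 1))  # +1 for the separator that closes each document
    # Naive construction by sorting all suffixes: fine for a demo, far too slow for real collections.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, doc_of


def sa_range(text, sa, pattern):
    """Return the half-open suffix-array interval [start, end) of suffixes that start with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def list_documents(index, pattern):
    """Report each document containing pattern once, in increasing document order."""
    text, sa, doc_of = index
    start, end = sa_range(text, sa, pattern)
    # Visits every occurrence, so the cost grows with the number of occurrences,
    # not with the number of distinct documents reported.
    return sorted({doc_of[sa[k]] for k in range(start, end)})


if __name__ == "__main__":
    index = build_index(["abracadabra", "cadabra", "banana"])
    print(list_documents(index, "abra"))  # [0, 1]
    print(list_documents(index, "an"))    # [2]
```

The final step of `list_documents` inspects every occurrence of the pattern, so its cost depends on the total number of occurrences rather than on the number of distinct documents in the answer. Avoiding that overhead, while also shrinking the index, is what dedicated document listing structures aim for.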

Similar resources

Definability and Compression

A compression algorithm takes a finite structure of a class K as input and produces a finite structure of a different class K’ as output. Given a property P on the class K defined in a logic L, we study the definability of property P on the class K’. We consider two compression schemas on unary ordered structures (words), compression by runlength encoding and the classical Lempel-Ziv. First-orde...

Definability and Compression

A compression algorithm takes a finite structure of a class K as input and produces a finite structure of a different class K' as output. Given a property P on the class K defined in a logic L, we study the definability of property P on the class K'. We consider two compression schemas on unary ordered structures (words), a naive compression and the classical Lempel-Ziv. First-order properties of stri...

A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text

We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is essential to have compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...

Efficient Compressed Indexing for Approximate Top-k String Retrieval

Given a collection of strings (called documents), the top-k document retrieval problem is that of, given a string pattern p, finding the k documents where p appears most often. This is a basic task in most information retrieval scenarios. The best current implementations require 20–30 bits per character (bpc) and k to 4k microseconds per query, or 12–24 bpc and 1–10 milliseconds per query. We i...

A Unifying Framework for Compressed Pattern Matching

We introduce a general framework which is suitable to capture an essence of compressed pattern matching according to various dictionary based compressions. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel-Ziv family, (LZ77, LZSS, LZ78, LZW), byte-...

Publication date: 2013